Short-Term Memory

Short-term memory keeps track of the current conversation. It's what lets your agent remember that the user said "My name is Alice" three messages ago, and what keeps context windows from overflowing in long conversations.

The Problem

LLMs have limited context windows. As conversations grow:

Turn 1:  User: "My name is Alice"         ← 10 tokens
Turn 50: User: "What's my name again?"    ← 50,000 tokens total
         Agent: "I don't know"            ← context overflow, earlier messages lost

Naive solutions have tradeoffs:

Approach               Problem
Keep everything        Exceeds context window, costs explode
Drop old messages      Loses important context ("My name is Alice")
Fixed sliding window   Arbitrary cutoff, may drop critical info
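
To make the fixed-window failure concrete, here is a toy sketch (plain Python, not spoon-ai code) that keeps only the last three messages and silently drops the turn containing the user's name:

# Toy illustration of a fixed sliding window (not how SpoonOS manages memory).
history = [
    {"role": "user", "content": "My name is Alice"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "What's the capital of France?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user", "content": "What's my name?"},
]

WINDOW = 3  # arbitrary cutoff in messages
window = history[-WINDOW:]

# The "My name is Alice" turn is no longer in the window, so the model cannot answer.
assert all("Alice" not in m["content"] for m in window)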

SpoonOS Solution

ShortTermMemoryManager intelligently manages context:

Feature          What It Does
Token-aware      Tracks actual token count, not message count
Smart trimming   Multiple strategies: oldest first, from start, from end
Summarization    Condenses old messages into a summary when needed
Built-in         ChatBot handles this automatically; no extra code needed
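
As a rough illustration of why token-aware budgeting matters, here is a library-independent sketch (the whitespace split is only a stand-in for a real tokenizer) comparing "keep the last N messages" with "keep what fits in a token budget":

# Message-count vs. token-aware budgeting (conceptual sketch, not spoon-ai internals).
messages = [
    "My name is Alice",
    "pasted document " * 1000,   # one huge message, thousands of tokens
    "What's my name?",
]

def rough_token_count(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

# Message-count budgeting ignores size: the last 2 messages include the huge document.
by_count = messages[-2:]

# Token-aware budgeting keeps the most recent messages that still fit the budget.
budget, kept, used = 50, [], 0
for msg in reversed(messages):
    cost = rough_token_count(msg)
    if used + cost > budget:
        break
    kept.insert(0, msg)
    used += cost

print(sum(map(rough_token_count, by_count)))  # ~2000 tokens: far over a 50-token budget
print(sum(map(rough_token_count, kept)))      # 3 tokens: within the budget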

When To Use

  • Chatbots with multi-turn conversations
  • Agents that need to remember earlier context
  • Long sessions that would exceed context limits
  • Cost optimization to reduce token usage

Quick Start

pip install spoon-ai
export OPENAI_API_KEY="your-key"

import asyncio
from spoon_ai.chat import ChatBot

# ChatBot includes built-in short-term memory with auto-trimming
llm = ChatBot(model_name="gpt-5.1-chat-latest", llm_provider="openai")

async def main():
    await llm.ask([{"role": "user", "content": "My name is Alice"}])
    await llm.ask([{"role": "user", "content": "What's the capital of France?"}])
    response = await llm.ask([{"role": "user", "content": "What's my name?"}])  # Remembers "Alice"
    print(response)

asyncio.run(main())

ShortTermMemoryManager

For fine-grained control beyond ChatBot's automatic handling, use ShortTermMemoryManager directly:

import asyncio
from spoon_ai.memory.short_term_manager import ShortTermMemoryManager, TrimStrategy
from spoon_ai.schema import Message

manager = ShortTermMemoryManager()
history = [
    Message(id="u1", role="user", content="Hello!"),
    Message(id="a1", role="assistant", content="Hi there, how can I help?"),
    Message(id="u2", role="user", content="What's DeFi?"),
]


# Trim the message list by token budget
trimmed = asyncio.run(
    manager.trim_messages(
        messages=history,
        max_tokens=48,
        strategy=TrimStrategy.FROM_END,
        keep_system=True,
    )
)
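
Assuming trim_messages returns the surviving Message objects (matching the input type above), you can quickly inspect what fit within the budget:

# Inspect which messages survived the 48-token budget.
for msg in trimmed:
    print(f"{msg.role}: {msg.content}")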

# Summarize history before a model call.
# Assumes `chatbot` is a configured ChatBot instance (see the note and snippet below).
llm_ready, removals, summary = asyncio.run(
    manager.summarize_messages(
        messages=history,
        max_tokens_before_summary=48,
        messages_to_keep=2,
        summary_model="anthropic/claude-3.5-sonnet",
        llm_manager=chatbot.llm_manager,
        llm_provider=chatbot.llm_provider,
        existing_summary=chatbot.latest_summary() or "",
    )
)

  • llm_ready: the condensed history you can pass to the LLM
  • removals: a list of RemoveMessage directives; apply them to your persisted history using spoon_ai.graph.reducers.add_messages
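
After a summarization pass you would typically send llm_ready to the model and fold the removal directives back into whatever history you persist. A minimal sketch, reusing history and removals from the call above:

from spoon_ai.graph.reducers import add_messages

# Keep the stored transcript consistent with what summarization condensed away.
persisted_history = add_messages(history, removals)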

Note: both summarize_messages() and ChatBot.ask() invoke your configured LLM. Ensure chatbot.llm_manager/chatbot.llm_provider (and any required API keys or env vars) are set so these examples can run end‑to‑end.

from spoon_ai.chat import ChatBot
from spoon_ai.graph.reducers import add_messages
from spoon_ai.schema import Message

chatbot = ChatBot(enable_short_term_memory=True)
history = [
    Message(id="u1", role="user", content="Hello!"),
    Message(id="a1", role="assistant", content="Hi there, how can I help?"),
    Message(id="u2", role="user", content="What's DeFi?"),
]
# Remove the latest assistant message
assistant_ids = [msg.id for msg in history if msg.role == "assistant"]
remove_last = chatbot.remove_message(assistant_ids[-1])

# Or clear the entire history
remove_all = chatbot.remove_all_messages()

# Apply directives to persisted history
updated_history = add_messages(history, [remove_last])
cleared_history = add_messages(history, [remove_all])

add_messages() merges the removal directives into the existing history, deleting targeted entries (or the entire transcript). This mirrors how the short-term memory manager emits RemoveMessage items when summarization trims older turns.


Inspecting Thread State and Checkpoints

Every time the graph runs, you can retrieve the latest snapshot (messages plus metadata), iterate the full checkpoint history, or read a CheckpointTuple for an external consumer. This makes it easy to debug memory behaviour, replay from any checkpoint, or sync state to persistent storage. The example below shows how to fetch the most recent snapshot, list all checkpoints, and view the tuple-style payload.

config = {"configurable": {"thread_id": "memory_demo_thread"}}

snapshot = graph.get_state(config)
print("Latest checkpoint:", snapshot.metadata.get("checkpoint_id"))

for snap in graph.get_state_history(config):
    print("History id:", snap.metadata.get("checkpoint_id"))

checkpoint_tuple = graph.checkpointer.get_checkpoint_tuple(config)
print("Checkpoint tuple:", checkpoint_tuple)

for entry in graph.checkpointer.iter_checkpoint_history(config):
    print("Tuple history entry:", entry)

If you are using a compiled graph (CompiledGraph), call graph.graph.get_state(config) and graph.graph.get_state_history(config) instead; the snippet above assumes graph is a StateGraph.
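
For a compiled graph, the equivalent inspection calls from the note above look like this (same config as before; only the attribute path changes):

# Assumes `graph` is a CompiledGraph; reach the underlying StateGraph via `.graph`.
snapshot = graph.graph.get_state(config)
print("Latest checkpoint:", snapshot.metadata.get("checkpoint_id"))

for snap in graph.graph.get_state_history(config):
    print("History id:", snap.metadata.get("checkpoint_id"))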